Wednesday, February 18 2015

The Roadmap

  • Structure of spatial data in R
  • Reading / Writing Spatial Data
  • Visualizing Spatial Data
  • Analysing Spatial Data
  • Tools, Tips, Resources Roadmap

Explosion of Spatial Packages in R Recently

Get to know sp

sp slots mirror the structure of ESRI shapefiles

Mandatory Components Optional Components
.shp - actual geometry of feature .prj - CRS and projection info in WKT format
.shx - shapef index - binary file giving position index .sbn and .sbx- spatial indexing files
.dbf - attribute information, stored in dBase IV format .xml - metadata file

Get to know sp objects

library(rgdal)
data(nor2k)
class(nor2k)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
slotNames(nor2k)
[1] "data"        "coords.nrs"  "coords"      "bbox"        "proj4string"
plot(nor2k, axes=T, main='Peaks in Norway over 2000 meters')

Get to know sp objects

library(rgdal)
data(nor2k)
summary(nor2k)
Object of class SpatialPointsDataFrame
Coordinates:
           min     max
East    404700  547250
North  6804200 6910050
Height    2001    2469
Is projected: TRUE 
proj4string :
[+proj=utm +zone=32 +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0]
Number of points: 300
Data attributes:
      Nr.             Navn             Kommune         
 Min.   :  1.00   Length:300         Length:300        
 1st Qu.: 75.75   Class :character   Class :character  
 Median :150.50   Mode  :character   Mode  :character  
 Mean   :150.50                                        
 3rd Qu.:225.25                                        
 Max.   :300.00                                        

Get to know raster

  • sp has spatial grid and pixel classes, but raster package is best for raster data in R
  • Raster data structure divides region into rectangles / cells
  • Cells can store one or more values for the cells
library(raster)
r <- raster(ncol=10, nrow=10)
values(r) <- c(1:100)
plot(r, main='Raster with 100 cells')

Making data spatial

library(maps);library(sp);require(knitr)
data(us.cities)
knitr::kable(us.cities[1:5,])
name country.etc pop lat long capital
Abilene TX TX 113888 32.45 -99.74 0
Akron OH OH 206634 41.08 -81.52 0
Alameda CA CA 70069 37.77 -122.26 0
Albany GA GA 75510 31.58 -84.18 0
Albany NY NY 93576 42.67 -73.80 2
class(us.cities) # simple data frame

[1] "data.frame"

Making data spatial

Promote a data frame with coordinate to an sp SpatialPointsDataFrame

library(maps);library(sp)
data(us.cities)
coordinates(us.cities) <- c("long", "lat") 
class(us.cities) 

[1] "SpatialPointsDataFrame" attr(,"package") [1] "sp"

plot(us.cities, pch = 20, col = 'forestgreen', axes=T,
     xlim=c(-125,-70), ylim=c(26,55))

Maps package provides convenient stock maps

library(maps)
par(mfrow=c(1,1))
map()

map.text('county','oregon')
map.axes()
title(main="Oregon State")

Loading administrative backgrounds from Global Administrative Areas is another good option (http://gadm.org/)

Dealing with coordinate reference systems in R

  • CRS can be geographic (lat/lon), projected, or NA in R
  • Data with different CRS MUST be transformed to common CRS in R
  • Projections in sp are provided in PROJ4 strings in the proj4string slot of an object
  • http://www.spatialreference.org/

  • Useful rgdal package functions:
    • projInfo(type='datum')
    • projInfo(type='ellps')
    • projInfo(type='proj')

CRS

Dealing with coordinate reference systems in R

  • For sp classes:
    • To get the CRS: proj4string(x)
    • To assign the CRS:
      • Use either EPSG code or PROJ.4:
        • proj4string(x) <- CRS("init=epsg:4269")
        • proj4string(x) <- CRS("+proj=utm +zone=10 +datum=WGS84")
    • To transform CRS
      • x <- spTransform(x, CRS("+init=epsg:4238"))
      • x <- spTransform(x, proj4string(y))
  • For rasters:
    • To get the CRS: projection(x)
    • To transform CRS: projectRaster(x)

Reading and writing spatial data

  • Best method for reading and writing shapefiles is to use readOGR() and writeOGR() in rgdal
library(rgdal)
setwd('K:/GitProjects/RUserWebinar')
HUCs <- readOGR('.','OR_HUC08')
writeOGR(HUCs, '.','HUC', driver='ESRI Shapefile')
  • Best method for reading and writing rasters is raster package
library(raster)
r <- raster('clay.tif')
# crop it
r <- crop(r, extent(-1000000, 2000000, 100000, 3000000) )
writeRaster(r, 'clay_smaller.tif',format='GTiff')

Understanding slot structure

require(sp);require(rgeos);load("K:/GitProjects/RUserWebinar/Data.RData")
# A spatial PolygonsDataFrame has 5 slots:
str(HUCs, 2)
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 91 obs. of  3 variables:
  ..@ polygons   :List of 91
  ..@ plotOrder  : int [1:91] 78 1 32 87 81 24 26 11 38 3 ...
  ..@ bbox       : num [1:2, 1:2] -2297797 2191569 -1576981 2914667
  .. ..- attr(*, "dimnames")=List of 2
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot

Understanding slot structure

require(sp);require(rgeos);load("K:/GitProjects/RUserWebinar/Data.RData")
# Each polygon element has 5 of it's own slots - here we look at first one:
str(HUCs@polygons[[1]])[]
Formal class 'Polygons' [package "sp"] with 5 slots
  ..@ Polygons :List of 1
  .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. ..@ labpt  : num [1:2] -1805054 2272669
  .. .. .. ..@ area   : num 9.13e+09
  .. .. .. ..@ hole   : logi FALSE
  .. .. .. ..@ ringDir: int 1
  .. .. .. ..@ coords : num [1:13339, 1:2] -1782485 -1782433 -1782329 -1782239 -1782084 ...
  ..@ plotOrder: int 1
  ..@ labpt    : num [1:2] -1805054 2272669
  ..@ ID       : chr "0"
  ..@ area     : num 9.13e+09
NULL

Understanding slot structure

require(sp);require(rgeos); load("K:/GitProjects/RUserWebinar/Data.RData")
# Here we get a vector of area for features in HUCs spdf
area(HUCs[1:4,])
[1] 9128424113 2997942053 6065238017 4861449846
# Total Area - gArea function in rgeos gives same result
sum(area(HUCs))
[1] 312620322426
# Area of a particular feature
HUCs@polygons[[1]]@Polygons[[1]]@area
[1] 9128424113

Getting areas of polygons

require(sp);load("K:/GitProjects/RUserWebinar/Data.RData")
plot(HUCs, axes=T, main='HUCs in Oregon')

# Function to calculate percent of area
AreaPercent <- function(x) {
  tot_area <- sum(sapply(slot(x, "polygons"),
                         slot, "area"))
  sapply(slot(x, "polygons"), slot, 
         "area") / tot_area * 100
}  

Getting areas of polygons

  • Highlight larger HUCs using Area function
require(sp);load("K:/GitProjects/RUserWebinar/Data.RData")
plot(HUCs, axes=T, main='HUCs in Oregon')
# Function to calculate percent of area
AreaPercent <- function(x) {
  tot_area <- sum(sapply(slot(x, "polygons"),
                         slot, "area"))
  sapply(slot(x, "polygons"), slot, 
         "area") / tot_area * 100
}  
# just plot bigger HUCs
plot(HUCs[AreaPercent(HUCs) > 1,], add=T, 
     col='Steel Blue')

Spatial Operations on vector data

require(ggplot2);load("K:/GitProjects/RUserWebinar/Data.RData")
# Take a look at some USGS stream gages for PNW
gages@data[1:5,5:8]
##    LON_SITE LAT_SITE    NearCity      AVE
## 1 -123.3178 42.43040 Grants Pass 3402.544
## 2 -122.6011 42.42763 Eagle Point   60.201
## 3 -119.9233 42.42488    Altamont   34.272
## 4 -122.6011 42.40819 Eagle Point  104.177
## 5 -122.5373 42.40263 Eagle Point   72.511
# Explicitly set CRS for layers
gages <- spTransform(gages, CRS('+init=epsg:2991'))
# Locations of gages
ggplot(gages@data, aes(LON_SITE, LAT_SITE)) + 
  geom_point(aes(color=log10(AVE)))

Spatial Operations on vector data

Spatial Indexing - Select just gages within certain HUCs

load("K:/GitProjects/RUserWebinar/Data.RData")
gages_proj <- spTransform(gages, CRS('+init=epsg:2991'))
HUCs_proj <- spTransform(HUCs, CRS('+init=epsg:2991'))
HUCs_proj$LON <- coordinates(HUCs_proj)[,1]
HUCs_west <- HUCs_proj[HUCs_proj$LON < 400000, ]
options(scipen=3)
plot(gages_proj, axes=T, col='blue')
plot(HUCs_west, add=T)

## Spatial Operations on vector data {.build .smaller .columns-2} ### Spatial Indexing - Select just gages within certain HUCs - Just index to subset spatially

load("K:/GitProjects/RUserWebinar/Data.RData")
gages_west <- gages_proj[HUCs_west,]
plot(HUCs_west, axes=T)
plot(gages_west, add=T, col='blue')

Spatial Operations on vector data

Overlay and Aggregation - What HUCs are gages located in?

load("K:/GitProjects/RUserWebinar/Data.RData")
OverUpdate <- function(points, polys) {
  pointpoly <- over(points, polys)
  points@data <- data.frame(points@data, pointpoly)
}
gages_proj@data <- OverUpdate(gages_proj, HUCs_proj)
head(gages_proj@data[,c(3,5:6,12)])
##                                      STATION_NM  LON_SITE LAT_SITE
## 1                ROGUE RIVER AT GRANTS PASS, OR -123.3178 42.43040
## 2   N F LTL BUTE CR AB INTKE CANL LKECREEK OREG -122.6011 42.42763
## 3                  HONEY CREEK NEAR PLUSH,OREG. -119.9233 42.42488
## 4 SOUTH FORK LITTLE BUTTE CR NR LAKECREEK,OREG. -122.6011 42.40819
## 5      NO FK LITTLE BUTTE CR NR LAKECREEK,OREG. -122.5373 42.40263
## 6             TWELVEMILE CREEK NEAR PLUSH,OREG. -120.0177 42.38322
##      HUC_8
## 1 17100308
## 2 17100307
## 3 17120007
## 4 17100307
## 5 17100307
## 6 17120007

Spatial Operations on vector data

Overlay and Aggregation - What is mean flow of gages within each HUC?

load("K:/GitProjects/RUserWebinar/Data.RData")
HUCs_proj$StreamFlow <- over(HUCs_proj,gages_proj[8],fn = mean)
head(HUCs_proj@data[!is.na(HUCs_proj@data$StreamFlow),])
      HUC_8 SUM_ACRES      LON       AVE
2  17050103   1498689 716737.1  32.78000
5  17050107    956848 669298.4 930.73000
8  17050110   1264239 631925.6 622.90075
10 17050116   1554027 572126.3 130.40100
11 17050117    607018 633179.4 304.05640
12 17050118    375002 617941.8  31.72333
# Or median..
HUCs_proj$StreamFlow <- over(HUCs_proj,gages_proj[8],fn = sum)
head(HUCs_proj@data[!is.na(HUCs_proj@data$StreamFlow),])
      HUC_8 SUM_ACRES      LON      AVE
2  17050103   1498689 716737.1   32.780
5  17050107    956848 669298.4  930.730
8  17050110   1264239 631925.6 2491.603
10 17050116   1554027 572126.3  652.005
11 17050117    607018 633179.4 1520.282
12 17050118    375002 617941.8   95.170

Attribute joining

require(plyr);load("K:/GitProjects/RUserWebinar/Data.RData")
head(cities[,c(3:4, 7:8)])
          NAME ST_ABBREV POP_2000 POP2007
1      Astoria        OR     9813    9901
2    Warrenton        OR     4096    4413
3     Gearhart        OR      995    1033
4      Seaside        OR     5900    5982
5   Clatskanie        OR     1528    1664
6 Cannon Beach        OR     1588    1669
# We can use match or join to connect to our spatial gages 
gages$POP2007 <- cities$POP2007[match(gages$NearCity, cities$NAME)]
# OR set a common name field and use join from plyr
names(gages)[7] <- "NAME"
gages@data <- join(gages@data, cities)
head(gages@data[,c(3,7,17:19)])
                                     STATION_NM        NAME PLACE_FIPS
1                ROGUE RIVER AT GRANTS PASS, OR Grants Pass      30550
2   N F LTL BUTE CR AB INTKE CANL LKECREEK OREG Eagle Point      21550
3                  HONEY CREEK NEAR PLUSH,OREG.    Altamont      01850
4 SOUTH FORK LITTLE BUTTE CR NR LAKECREEK,OREG. Eagle Point      21550
5      NO FK LITTLE BUTTE CR NR LAKECREEK,OREG. Eagle Point      21550
6             TWELVEMILE CREEK NEAR PLUSH,OREG.    Altamont      01850
  POP_2000      STATUS
1    23003 County Seat
2     4797        <NA>
3   -99999        <NA>
4     4797        <NA>
5     4797        <NA>
6   -99999        <NA>

Spatial operations on vector data

Dissolving

library(rgeos); library(rgdal)
# Use gUnaryUnion from rgeos to dissolve polygons
# http://www.maths.lancs.ac.uk/~rowlings/Teaching/Sheffield2013/spatialops.html
#create four bins of longitude values using coordinate data from HUCs
lps <- coordinates(HUCs)
IDFourBins <- cut(lps[,1], quantile(lps[,1]), 
                  include.lowest=TRUE)
regions = gUnaryUnion(HUCs,IDFourBins)
regions = SpatialPolygonsDataFrame(regions,
                    data.frame(regions = c('Coastal',
                    'Mountains','High Desert','Eastern')),
                    match.ID = FALSE)
plot(regions, axes=T)
text(coordinates(regions), 
     label = regions$regions, cex=.8)

Working with rasters

require(raster)
clay <- raster('clay.tif')
inMemory(clay)
[1] FALSE
cellStats(clay, min); cellStats(clay, max)
[1] 0
[1] 73.6665
projection(clay)
[1] "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
plot(clay)

Working with rasters

require(raster)
clay <- raster('clay.tif')
clay_OR <- crop(clay, extent(-2261000, -1594944, 2348115, 2850963))
plot(clay_OR)

gages <- spTransform(gages, CRS(projection(clay)))
gages$clay = extract(clay,gages)
head(gages@data[,c(3,12)])
##                                      STATION_NM POP2007
## 1                ROGUE RIVER AT GRANTS PASS, OR   24753
## 2   N F LTL BUTE CR AB INTKE CANL LKECREEK OREG    6819
## 3                  HONEY CREEK NEAR PLUSH,OREG.   20493
## 4 SOUTH FORK LITTLE BUTTE CR NR LAKECREEK,OREG.    6819
## 5      NO FK LITTLE BUTTE CR NR LAKECREEK,OREG.    6819
## 6             TWELVEMILE CREEK NEAR PLUSH,OREG.   20493

Visualizing data

RasterVis

library(rasterVis)
alt <- getData('worldclim', var='alt', res=2.5)
usa1 <- getData('GADM', country='USA', level=1)
oregon <- usa1[usa1$NAME_1 == 'Oregon',]
alt <- crop(alt, extent(oregon) + 0.5)
alt <- mask(alt, oregon)
levelplot(alt, par.settings=GrTheme)

Visualizing data

PlotGoogleMaps

require(plotGoogleMaps)
HUCs <- spTransform(HUCs, CRS('+init=epsg:28992'))
map <- plotGoogleMaps(HUCs, filename='RGoogleMapsExample.htm')
Roadmap

Visualizing data using ggmap

library(ggmap)
mymap <- get_map(location = "Corvallis, OR", source="google", maptype="terrain",zoom = 12)
ggmap(mymap, extent="device")

Visualizing data

Taking it a step further with ggmap

Roadmap

Some things still better in python

R Markdown + knitr package

Roadmap

Combining R and Python - the best of both worlds

R Markdown + knitr package

Roadmap

Resources